Image-based virtual try-on aims to replace the clothing in a person image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images; however, occlusion remains a pernicious obstacle to realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two types: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth is warped onto an unreasonable body part. Based on this analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantic guidance and a pose prior, textures of various complexity are selectively blended with human parts in a copy-and-paste manner. Then, the Generative Module (GM) synthesizes the final try-on image and jointly learns to de-occlude. In comparison to the state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
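The copy-and-paste blending guided by a semantic parsing can be illustrated with a minimal sketch. This is not the DOC-VTON implementation: the function name `semantic_mixup`, the blend weight `alpha`, the label id `ARM`, and the use of tiny grayscale nested lists are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities
Mask = List[List[int]]     # per-pixel semantic label from a human parser


def semantic_mixup(person: Image, texture: Image, parsing: Mask,
                   part_label: int, alpha: float = 0.7) -> Image:
    """Blend `texture` into `person` only where `parsing == part_label`.

    `alpha` (hypothetical parameter) is the texture weight inside the
    selected part; pixels outside the part are copied unchanged.
    """
    out = []
    for p_row, t_row, m_row in zip(person, texture, parsing):
        out.append([
            alpha * t + (1.0 - alpha) * p if m == part_label else p
            for p, t, m in zip(p_row, t_row, m_row)
        ])
    return out


ARM = 2  # hypothetical label id for the arm region
person = [[0.0, 0.0], [1.0, 1.0]]
texture = [[1.0, 1.0], [0.0, 0.0]]
parsing = [[ARM, 0], [0, ARM]]

# Only the two pixels labelled ARM are mixed with the texture.
occluded = semantic_mixup(person, texture, parsing, ARM, alpha=0.5)
```

In this toy setting the occluded image serves as a simulated training input, while the unmodified `person` image plays the role of the clean target.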
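Real-world image denoising is a practical image restoration problem that aims to obtain clean images from noisy inputs captured in the wild. Recently, the Vision Transformer (ViT) has shown a strong ability to capture long-range dependencies, and many researchers have attempted to apply ViT to image denoising tasks. However, a real-world image is an isolated frame, which forces the ViT to build long-range dependencies among internal patches; dividing the image into patches disarranges the noise pattern and gradient continuity. In this paper, we propose to resolve this issue with a continuous Wavelet Sliding-Transformer that builds frequency correspondences in real-world scenes, called DnSwin. Specifically, we first extract bottom features from the noisy input image using a CNN encoder. The key to DnSwin is to separate high-frequency and low-frequency information from the features and to build frequency dependencies. To this end, we propose the Wavelet Sliding-Window Transformer, which uses the discrete wavelet transform, self-attention, and the inverse discrete wavelet transform to extract deep features. Finally, we reconstruct the deep features into denoised images using a CNN decoder. Both quantitative and qualitative evaluations on real-world denoising benchmarks demonstrate that the proposed DnSwin performs favorably against state-of-the-art methods.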
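The frequency separation step can be illustrated with a one-level Haar discrete wavelet transform on a 1D signal. This is only a sketch of the transform pair: the choice of the Haar wavelet, the function names, and the zeroing of the high-frequency band (a crude stand-in for the learned self-attention stage) are assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple


def haar_dwt(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar DWT of an even-length signal.

    Returns (low, high): scaled pairwise sums and pairwise differences.
    """
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high


def haar_idwt(low: List[float], high: List[float]) -> List[float]:
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(low, high):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


noisy = [4.0, 4.2, 8.0, 7.8]
low, high = haar_dwt(noisy)

# With both bands untouched, the inverse transform recovers the input.
restored = haar_idwt(low, high)

# Placeholder processing: discard the high-frequency band entirely.
smoothed = haar_idwt(low, [0.0] * len(high))
```

Dropping the high band replaces each sample pair by its mean, which shows why the two bands are worth treating separately: the low band carries the coarse structure and the high band carries fine detail together with much of the noise.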
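Real-scene image super-resolution (RealSR) aims to restore real-world low-resolution images to their high-quality versions. A typical RealSR framework usually optimizes multiple criteria designed for different image properties, under the implicit assumption that the ground-truth images provide a good trade-off between the different criteria. However, this assumption is easily violated in practice because of the inherent contrastive relationship between different image properties. Contrastive learning (CL) offers a promising recipe for relieving this problem by learning discriminative features with triplet contrastive losses. Although CL has achieved significant success in many computer vision tasks, introducing CL into RealSR is non-trivial because valid positive image pairs are hard to define in this setting. Inspired by the observation that a contrastive relationship can also exist between criteria, in this work we propose a novel training paradigm for RealSR, named Criteria Comparative Learning (Cria-CL), which develops contrastive losses defined on criteria instead of image patches. In addition, a spatial projector is proposed to obtain a good view for Cria-CL in RealSR. Our experiments show that, compared with the typical weighted regression strategy, our method achieves significant improvement under similar parameter settings.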
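For readers unfamiliar with the triplet contrastive loss mentioned above, here is a minimal sketch of the generic margin-based form on plain feature vectors. It is not the criteria-level loss of Cria-CL; the Euclidean distance, the `margin` value, and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]

# Negative far from the anchor: the margin is satisfied, loss is zero.
easy = triplet_loss(anchor, positive, [3.0, 4.0])

# Negative as close as the positive: the loss equals the margin.
hard = triplet_loss(anchor, positive, [1.0, 0.0])
```

The difficulty the abstract points to is visible here: the loss needs a well-defined `positive` for every anchor, which is hard to obtain for image pairs in RealSR.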
Texts generated by large pretrained language models have been shown to exhibit a variety of harmful, human-like biases about various demographics. These findings prompted large efforts aiming to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques mitigating these stereotypical associations. However, as recent research has pointed out, the current benchmarks lack a robust experimental setup, which hinders the inference of meaningful conclusions from their evaluation metrics. In this paper, we extend these arguments and demonstrate that existing techniques and benchmarks aiming to measure stereotypes tend to be inaccurate and contain a high degree of experimental noise that severely limits the knowledge we can gain from benchmarking language models with them. Accordingly, we propose a new framework for robustly measuring and quantifying biases exhibited by generative language models. Finally, we use this framework to investigate GPT-3's occupational gender bias and propose prompting techniques for mitigating these biases without the need for fine-tuning.
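Emotion-cause pair extraction (ECPE) is an emerging task in emotion cause analysis that extracts potential emotion-cause pairs from an emotional document. Recent studies use end-to-end methods to tackle the ECPE task. However, these methods either suffer from a label sparsity problem or fail to model the complicated relations between emotions and causes. Furthermore, none of them considers the explicit semantic information of clauses. To this end, we transform the ECPE task into a document-level machine reading comprehension (MRC) task and propose a Multi-turn MRC framework with Rethink mechanism (MM-R). Our framework can model the complicated relations between emotions and causes while avoiding the generation of a pairing matrix (the leading cause of the label sparsity problem). In addition, the multi-turn structure can fuse the explicit semantic information flow between emotions and causes. Extensive experiments on the benchmark emotion-cause corpus demonstrate the effectiveness of our proposed framework, which outperforms existing state-of-the-art methods.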
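Continual machine reading comprehension aims to learn incrementally from a continuous data stream without access to previously seen data, which is crucial for developing real-world MRC systems. However, learning a new domain incrementally without forgetting previous knowledge is a great challenge. In this paper, we propose MA-MRC, a continual MRC model with uncertainty-aware fixed memory and adversarial domain adaptation. In MA-MRC, a fixed-size memory stores a small number of samples from previous domain data, together with an uncertainty-aware update strategy that is applied when new domain data arrives. For incremental learning, MA-MRC not only maintains a stable understanding by learning from both the memory and the new domain data, but also makes full use of the domain adaptation relationship between them through an adversarial learning strategy. Experimental results show that MA-MRC outperforms strong baselines and has substantial incremental learning ability without catastrophic forgetting under two different continual MRC settings.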
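Human-translated text displays features distinct from those of text written natively in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. However, we find that existing work on translationese neglects some important factors, and that its conclusions are mostly correlational rather than causal. In this work, we collect CausalMT, a dataset in which the MT training data are also labeled with the human translation direction. We inspect two critical factors: the train-test direction match (whether the human translation directions in the training and test sets are aligned) and the data-model direction match (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese. In light of our findings, we provide a set of suggestions for MT training and evaluation. Our code and data are available at https://github.com/edisonni-hku/causalmt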
Reasoning is central to human intelligence. However, fallacious arguments are common, and some exacerbate problems such as spreading misinformation about climate change. In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate). Detecting logical fallacies is a hard problem as the model must understand the underlying logical structure of the argument. We find that existing pretrained large language models perform poorly on this task. In contrast, we show that a simple structure-aware classifier outperforms the best language model by 5.46% on Logic and 4.51% on LogicClimate. We encourage future work to explore this task as (a) it can serve as a new reasoning challenge for language models, and (b) it can have potential applications in tackling the spread of misinformation. Our dataset and code are available at https://github.com/causalNLP/logical-fallacy
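Text style transfer is an important task in natural language generation that aims to control certain attributes of the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing and has recently regained significant attention thanks to the promising performance of deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning more than 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, and the rich set of methodologies for both parallel and non-parallel data. We also discuss a variety of important topics regarding the future development of this task. Our curated paper list is at https://github.com/zhijing-jin/text_style_transfer_survey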
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TEXTFOOLER, a simple but strong baseline to generate adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT and the widely used convolutional and recurrent neural networks. We demonstrate three advantages of this framework: (1) effective: it outperforms previous attacks in success rate and perturbation rate; (2) utility-preserving: it preserves semantic content, grammaticality, and correct types classified by humans; and (3) efficient: it generates adversarial text with computational complexity linear in the text length.
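The general idea of a word-substitution attack can be sketched on a toy classifier. This is a heavily simplified illustration, not the TEXTFOOLER algorithm: the lexicon-based `toy_score`, the hand-written `synonyms` table, and the `threshold` are hypothetical, and the semantic-similarity and grammaticality checks that make such an attack utility-preserving are omitted.

```python
from typing import Callable, Dict, List

POSITIVE = {"great", "good", "excellent"}  # toy lexicon (hypothetical)


def toy_score(words: List[str]) -> float:
    """Toy sentiment score: fraction of words in the positive lexicon."""
    return sum(w in POSITIVE for w in words) / max(len(words), 1)


def attack(words: List[str], synonyms: Dict[str, List[str]],
           score: Callable[[List[str]], float] = toy_score,
           threshold: float = 0.25) -> List[str]:
    """Greedy substitution attack on a scoring function.

    1. Rank words by how much the score drops when each is deleted.
    2. In that order, swap a word for the synonym that lowers the
       score the most, stopping once the score is below `threshold`.
    """
    base = score(words)
    importance = [base - score(words[:i] + words[i + 1:])
                  for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: importance[i],
                   reverse=True)
    adv = list(words)
    for i in order:
        if score(adv) < threshold:
            break  # the toy prediction has flipped
        candidates = synonyms.get(adv[i], [])
        if candidates:
            adv[i] = min(candidates,
                         key=lambda c: score(adv[:i] + [c] + adv[i + 1:]))
    return adv


sentence = ["a", "great", "movie", "overall"]
synonyms = {"great": ["fine", "good"]}  # hypothetical synonym table
adversarial = attack(sentence, synonyms)
```

Here the attack changes a single word, the one whose deletion affects the score most, which mirrors the goal of a low perturbation rate.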